Hive 29574 merge join poc#6456
Conversation
abstractdog
left a comment
Nice patch so far, @illiabarbashov-sketch. I left some comments.
| skewedKeyFlagged = new boolean[maxAlias]; | ||
| } | ||
| public boolean isActive() { |
This can be private, or package-private with @VisibleForTesting if unit tests need it.
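To illustrate the visibility suggestion, here is a minimal sketch. It is not the class from the patch: `SkewMonitorSketch`, its constructor, and the `threshold > 0` check are hypothetical, and the locally declared `VisibleForTesting` annotation only stands in for Guava's `com.google.common.annotations.VisibleForTesting` so that the example compiles without dependencies.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical class, used only to show the suggested visibility.
public class SkewMonitorSketch {

  // Stand-in for Guava's @VisibleForTesting (assumption: the real code
  // would import the Guava annotation instead of declaring its own).
  @Retention(RetentionPolicy.SOURCE)
  @Target({ElementType.METHOD, ElementType.FIELD})
  @interface VisibleForTesting {}

  private final long threshold;

  SkewMonitorSketch(long threshold) {
    this.threshold = threshold;
  }

  // Package-private instead of public: production callers outside the
  // package cannot depend on it, while a unit test placed in the same
  // package can still call it directly.
  @VisibleForTesting
  boolean isActive() {
    // Assumption: a non-positive threshold means the check is disabled.
    return threshold > 0;
  }
}
```

If no test needs the method, plain `private` is the simpler choice.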
| SET hive.vectorized.execution.enabled=false; | ||
| set hive.mapred.mode=nonstrict; | ||
| set hive.explain.user=false; | ||
| set hive.cbo.enable=false; |
CBO should be enabled; we can simply delete this line.
| SET hive.vectorized.execution.enabled=false; | ||
| set hive.mapred.mode=nonstrict; | ||
| set hive.explain.user=false; | ||
| set hive.cbo.enable=false; |
CBO should be enabled; we can simply delete this line.
| @@ -0,0 +1,35 @@ | |||
| SET hive.vectorized.execution.enabled=false; | |||
Vectorization should be enabled; we can simply delete this line.
| @@ -0,0 +1,20 @@ | |||
| SET hive.vectorized.execution.enabled=false; | |||
Vectorization should be enabled; we can simply delete this line.
| set hive.explain.user=false; | ||
| set hive.cbo.enable=true; | ||
| set hive.auto.convert.join=false; | ||
| set hive.optimize.ppd=false; |
Is hive.optimize.ppd=false needed?
| @@ -0,0 +1,35 @@ | |||
| SET hive.vectorized.execution.enabled=true; | |||
| set hive.mapred.mode=nonstrict; | |||
Is hive.mapred.mode=nonstrict needed?
| @@ -0,0 +1,20 @@ | |||
| SET hive.vectorized.execution.enabled=true; | |||
| set hive.mapred.mode=nonstrict; | |||
Is hive.mapred.mode=nonstrict needed?
| HIVE_JOIN_CACHE_SIZE("hive.join.cache.size", 25000, | ||
| "How many rows in the joining tables (except the streaming table) should be cached in memory."), | ||
| HIVE_MERGE_JOIN_SKEW_THRESHOLD("hive.merge.join.skew.threshold", -1L, | ||
| "Maximum number of rows allowed per join key in a single Tez sort-merge join task before a " |
We can remove "Tez".
Even though Tez is the only execution engine we currently support, this feature is theoretically orthogonal to it, so this should rather read "Maximum number of rows allowed per join key in a single sort-merge reducer join task before..."
| "Maximum number of rows allowed per join key in a single Tez sort-merge join task before a " | ||
| + "skew event is reported."), | ||
| HIVE_MERGE_JOIN_SKEW_ABORT("hive.merge.join.skew.abort", false, | ||
| "When set to true and the row count is equal to hive.merge.join.skew.threshold, the Tez task will be aborted."), |
Maybe remove "Tez" from here too.
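For readers following the thread, the behavior described by the two setting descriptions above can be sketched roughly as follows. This is not the code from the patch: `MergeJoinSkewSketch`, `recordRow`, and `getSkewEvents` are hypothetical names, and the sketch assumes that a non-positive threshold disables the check and that rows for a given join key arrive consecutively, as they do in a sort-merge join.

```java
// Hypothetical illustration of hive.merge.join.skew.threshold and
// hive.merge.join.skew.abort; not the implementation under review.
public class MergeJoinSkewSketch {
  private final long threshold; // assumption: <= 0 disables the check
  private final boolean abort;

  private String currentKey;    // key of the run currently being counted
  private long currentCount;    // rows seen so far for currentKey
  private int skewEvents;       // number of skew events reported

  public MergeJoinSkewSketch(long threshold, boolean abort) {
    this.threshold = threshold;
    this.abort = abort;
  }

  /** Counts one row for the key; returns true if this row triggered a skew event. */
  public boolean recordRow(String key) {
    if (threshold <= 0) {
      return false; // check disabled (the default in the diff is -1)
    }
    if (!key.equals(currentKey)) {
      // Sorted input: a different key starts a new run.
      currentKey = key;
      currentCount = 0;
    }
    currentCount++;
    if (currentCount != threshold) {
      return false; // report only at the moment the count equals the threshold
    }
    skewEvents++;
    if (abort) {
      throw new IllegalStateException(
          "Join key " + key + " reached " + threshold + " rows");
    }
    return true;
  }

  public int getSkewEvents() {
    return skewEvents;
  }
}
```

Nothing in this sketch depends on the execution engine, which is the point of dropping "Tez" from both descriptions.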



What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
No
How was this patch tested?